Bypass the softmax pytorch kernel that upscales to fp32 by telgamal-1 · Pull Request #19865 · pytorch/executorch

telgamal-1 · 2026-05-29T00:16:54Z

Summary: Bypasses the PyTorch softmax kernel in `static_attention` that upscales activations to fp32, keeping the softmax computation in fp16. Also updates `norm.py` to handle the fp16 softmax output.

Differential Revision: D106729898
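The diff is not shown on this page, so what follows is only a hedged sketch of what "keeping the softmax computation in fp16" can mean numerically. It is not the code in `static_attention`. It uses the Python standard library and emulates fp16 and fp32 rounding with `struct`'s `'e'` and `'f'` formats. The helper names (`to_fp16`, `to_fp32`, `softmax_fp16`, `softmax_upcast`) are hypothetical. It assumes a max-subtracted softmax in which every intermediate, including the running sum, is rounded to the working precision; a real kernel may order or accumulate operations differently.

```python
import math
import struct
from typing import Callable, List, Sequence


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (fp16) value.

    struct's 'e' format packs a half-precision float; it raises OverflowError
    for finite values too large for fp16.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (fp32) value via struct's 'f' format."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def _softmax(xs: Sequence[float], rnd: Callable[[float], float]) -> List[float]:
    """Max-subtracted softmax where every intermediate is rounded with rnd."""
    m = max(xs)
    # Subtracting the row max makes every exponent <= 0, so each exp() is <= 1
    # and cannot overflow fp16, whose largest finite value is 65504.
    exps = [rnd(math.exp(rnd(x - m))) for x in xs]
    total = 0.0
    for e in exps:
        total = rnd(total + e)  # running sum kept at the target precision
    return [rnd(e / total) for e in exps]


def softmax_fp16(logits: Sequence[float]) -> List[float]:
    """Hypothetical fp16-only path: inputs, intermediates and outputs stay fp16."""
    return _softmax([to_fp16(v) for v in logits], to_fp16)


def softmax_upcast(logits: Sequence[float]) -> List[float]:
    """Hypothetical upcast path: fp16 inputs -> fp32 compute -> fp16 outputs."""
    xs = [to_fp32(to_fp16(v)) for v in logits]
    return [to_fp16(p) for p in _softmax(xs, to_fp32)]


if __name__ == "__main__":
    logits = [100.0, 101.0, 102.0]
    print("fp16 path:  ", softmax_fp16(logits))
    print("upcast path:", softmax_upcast(logits))
```

Subtracting the row max is what makes the fp16 path safe: without it, `math.exp(102.0)` is far above 65504, and `to_fp16` raises `OverflowError`. The trade-off is precision: fp16 has an 11-bit significand versus 24 bits for fp32, so the fp16 path carries only about three significant decimal digits and can differ from the upcast path by a few fp16 ulps.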


pytorch-bot · 2026-05-29T00:16:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19865

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

/easycla is not responding

❌ 1 Awaiting Approval, 2 Unrelated Failures, 6 Unclassified Failures

As of commit 4260f2a with merge base 42581f1:

AWAITING APPROVAL - The following workflow needs approval before CI can run:

Lint (gh)

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Linux Wheels / pytorch/executorch / build-manywheel-py3_10-cpu (gh)
TypeError: dataclass() got an unexpected keyword argument 'slots'
Build Linux Wheels / pytorch/executorch / upload / upload-manywheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x86_64
Build macOS Wheels / pytorch/executorch / build-wheel-py3_10-cpu (gh)
TypeError: dataclass() got an unexpected keyword argument 'slots'
Build macOS Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_
Build Windows Wheels / pytorch/executorch / build-wheel-py3_10-cpu (gh)
TypeError: dataclass() got an unexpected keyword argument 'slots'
Build Windows Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu (gh)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_x64

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / macos / macos-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / macos / macos-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-05-29T00:17:01Z

❌ - login: @telgamal-1 / name: Tarek Elgamal. The commit (4260f2a) is not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

meta-codesync · 2026-05-29T00:17:03Z

@telgamal-1 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106729898.

github-actions · 2026-05-29T00:17:48Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-cla Bot added the CLA Signed label May 29, 2026

meta-codesync Bot added the fb-exported and meta-exported labels May 29, 2026


telgamal-1 wants to merge 1 commit into pytorch:main from telgamal-1:export-D106729898
